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The task of accurately locating fluid phase boundaries by means of computer simulation is ham- 
pered by problems associated with sampling both coexisting phases in a single simulation run. We 
explain the physical background to these problems and describe how they can be tackled using a 
synthesis of biased Monte Carlo sampling and histogram extrapolation methods, married to a stan- 
dard fluid simulation algorithm. It is demonstrated that the combined approach provides a powerful 
method for tracing fluid phase boundaries. 



I. INTRODUCTION 

One of the fundamental tasks of statistical mechanics is 
to forge the link between the microscopic (atomic-scale) 
description of a particular substance and its equilibrium 
macroscopic (thermodynamic) properties. Typically the 
former is prescribed in terms of a model, ie. a speci- 
fication of the constituent molecules and their mutual 
interactions. Given this, the challenge is to derive the as- 
sociated macroscopic properties — quantities such as the 
compressibility, heat capacity and, above all, the phase 
behaviour i.e. the conditions under which the substance 
forms a solid, liquid or gas. Computer simulation allows 
one to do this. 

In many respects, a simulation can be viewed as an 
experiment performed not on a real substance, but on 
a model system stored in a computer's memory. As in 
real experiments, one has to prepare a properly equili- 
brated sample under the desired thermodynamic condi- 
tions; and, just as in real experiments, one can measure 
the physical properties of that substance, perhaps lead- 
ing to new discoveries as a result! But, a simulation can 
have important advantages over a real experiment. Since 
one is dealing with a numerical model, substances can 
be studied that are too expensive, too complicated or 
too dangerous to be tackled by real experiment. Fur- 
thermore, since the simulator has access to full and com- 
plete information about the state of the simulated sys- 
tem, there are fewer restrictions on just which properties 
can measured. Accordingly, information and insight can 
be gleaned from a simulation which would not readily be 
obtainable by experimental means 

However simulations do have their limitations. The 
chief drawbacks are constraints on computer speed and 
memory; most contemporary computers can deal only 
with systems comprising a few thousand particles -many 
orders of magnitude fewer than the ^ 10^^ typically 
found in experimental samples. Such restrictions engen- 
der so-called finite-size effects in the results, ie. spurious 
artifacts and systematic discrepancies (compared to the 
bulk limit) the magnitude of which need to be quantified. 
It should also be borne in mind that irrespective of the 
sophistication of the simulation techniques employed, re- 
sults for macroscopic equilibrium behaviour will reflect 



reality only to the extent that the underlying model cor- 
rectly captures the true microscopic nature of the sub- 
stance under study. As the old computing adage goes: 
rubbish in, rubbish out. 

Simulation strategies for obtaining the equilibrium 
phase behaviour of classical fluid systems come in a pro- 
fusion of different shapes and forms, but broadly speaking 
fall into two main categories: Molecular Dynamics (MD) 
and Monte Carlo (MC). Both have been previously dis- 
cussed extensively in a number of introductory texts and 
articles, see eg. refs. (^-|^- The MD approach involves 
computing the phase space trajectories of a system of 
mutually interacting particles by integrating Newton's 
equations of motions through time. Physical properties 
are measured in terms of configurational or time aver- 
ages as the simulation evolves. MD represents an attrac- 
tive method in situations where one is interested in ob- 
taining dynamical information about a system, and can 
also be employed to obtain single phase thermodynamical 
properties. However for studies of phase transitions, ie. 
the process by which one phase spontaneously transforms 
into another, it is rarely a suitable approach because of 
the problem of hysteresis (superheating and supercool- 
ing) wherein the temperature and pressure at which the 
transition occurs in a simulation may differ significantly 
from that of the bulk system. This matter is described 
in detail in sec. ||. 

Phase transitions (it now seems quite generally agreed) 
are best tackled by Monte Carlo (MC) simulation. Here 
one employs a stochastic Markov process to generate a se- 
quence of equilibrium configurations of the model system 
of interest; physical properties are measured as configura- 
tional averages over the sequence [^-^ . The actual mech- 
anisms by which the system evolves from one equilibrium 
configuration to the next are essentially artificial, so in- 
formation about physical dynamical processes is strictly 
limited. Nevertheless this loss is compensated for by po- 
tential gains in a variety of other areas. Specifically, freed 
from the strictures imposed by physical dynamics, the 
simulator is at liberty to construct any number of elabo- 
rate schemes via which the simulation efficiently explores 
the space of possible model configurations. By so doing it 
becomes possible to bridge time and length scales which 
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cannot be probed using MD. 

In this article we shall focus on one area in which re- 
cently developed MC simulation techniques allow one to 
circumvent an old problem, namely that of accurately 
obtaining the phase behaviour of model fluids. The ap- 
proach we describe is in essence a synthesis of a number 
of existing simulation techniques (developed originally in 
the context of lattice spin models) which, when married 
with a standard grand canonical or constant pressure en- 
semble fluid simulation, provide a powerful and efficient 
route to phase coexistence properties. Before embarking 
on this description, however, it is instructive to review 
some key aspects of phase transition phenomenology, in 
particular the origins of the hysteresis effect which for 
many years plagued simulation studies of phase transi- 
tion and which necessitated the new methodological ad- 
vances. 



II. HYSTERESIS, INTERFACES AND THE FREE 
ENERGY BARRIER 

The phase diagram of a typical simple substance such 
as Argon, is depicted in schematic form in figure 1. De- 
pending on the imposed conditions of temperature and 
pressure, the substance can exist in three phases: solid, 
liquid or gas. The corresponding regions of the phase 
diagram are delineated by phase boundaries at which 
transitions occur from one phase to another. Notable 
features of this phase diagram are the triple point where 
three phase boundary lines meet and the liquid-gas criti- 
cal point which terminates the liquid-gas phase boundary 
and beyond which all distinction between the liquid and 
the gas vanishes. 




FIG. 1. Phase diagram of a simple substance in the pres- 
sure-temperature plane. 

Phase diagrams such as that of fig. Qcan be interpreted 
within the framework of thermodynamics by appeal to 
the concept of free energy. A system in equilibrium will 
always choose its state to be that for which the free en- 
ergy is least Q. Phase boundaries then arise naturally 
as that sets of points in the phase diagram for which 



two phases have the same free energies, being equally 
favoured thermodynamically. In experiments, however, 
a transition from one stable phase to another will not al- 
ways occur exactly on the phase boundary. Instead one 
generally encounters 'overshoot' or 'hysteresis', whereby 
the actual transition point depends on the thermody- 
namic history of the sample. 

The following gedanken experiment will help to explain 
the origin of hysteresis. Imagine we take a purified quan- 
tity of a fluid such as water, place it in a sealed piston- 
cylinder arrangement at constant atmospheric pressure, 
and heat it up very slowly so that it always remains close 
to equilibrium. During this process, the water tempera- 
ture will rise until at some point it boils -transforming 
into steam with a concomitant large and abrupt increase 
in the system volume. This is an example of what is 
called a first order phase transition. If, however, we stop 
adding heat before all the water has vapourised we ob- 
serve coexistence between the liquid and vapour phases 
i.e. a portion of the container will be occupied by the liq- 
uid phase, separated by an interface from the remainder 
which contains vapour (cf. fig. In transforming from 
one phase to another, a system always passes through 
such mixed-phase configurations and it is these that are 
responsible for hysteresis. 




FIG. 2. Photograph of coexisting liquid water and steam 
in a closed container. The denser liquid occupies the lower 
portion of the container, separated by an interface from the 
vapour. 

It transpires that mixed-phase configurations possess a 
higher free energy than pure phase states and since this 
additional free energy is wholly associated with the inter- 
face itself, it is often referred to as the surface tension. 
Owing to their surface tension, mixed-phase configura- 
tions are thermodynamically less favourable than pure 
phase states at the phase boundary. This has a bear- 
ing on phase transitions such as the vaporisation of our 
water sample. As the water is slowly heated it eventu- 
ally attains the phase boundary temperature T = 373.15 
K above which steam has a lower free energy than liq- 
uid water i.e. becomes the thermodynamically favoured 
state. Nevertheless it is possible to 'superheat' liquid wa- 
ter some way beyond this temperature without it trans- 
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forming to steam because, in order to do so, it must pass 
through the mixed-phase states of higher free energy. A 
similar effect occurs when cooling steam -it becomes pos- 
sible to 'supercool' it some way below 373.15 K without 
it liquefying. Hence the temperature at which the tran- 
sition occurs is not that at which the free energies of the 
two phases are equal, but instead depends on the initial 
state of the sample and the direction and rate in which 
the boundary is traversed. 

Much the same thing occurs in simulations of first or- 
der phase transitions. But here the problem is also in- 
timately bound up with issues of finite-size effects and 
simulation timescales. To appreciate how a simulation 
behaves near a first order phase transition, it is instruc- 
tive to consider a concrete example, namely the liquid- 
gas transition of a prototype model fluid -the famous 
Lennard- Jones fluid (LJF). Within this model, the in- 
teraction potential for two point particles separated by 
linear distance r is given by 



where the parameters e and a set the strength of the 
interaction and the length scale respectively 0. 

One way of performing a MC simulation of the LJF is 
to employ the grand-canonical ensemble (GCE) in which, 
for a given system volume V , the total configurational 
energy E and particle number N are permitted to fluc- 
tuate stochastically, but with average values determined 
by prescribed values of the temperature T and chemical 
potential /i. These latter two variables span the phase 
diagram and by tuning their values, transitions can be 
induced between gas, liquid and solid phases (cf. fig. 1). 

An outline of the operation of a GCE MC simulation 
is given in box 1. Fluctuations in the energy and par- 
ticle number occur by means of particle insertions and 
deletions. MC updates consist of either an insertion or 
a deletion attempt, each of which is proposed with equal 
probability. For an insertion, one chooses a random po- 
sition in the simulation box and calculates the energy 
change /S.E^ associated with placing a new particle at 
that position. The trial insertion is accepted with a prob- 
ability given by a so-called Metropolis rule designed to 
ensure that detailed balance is maintained -a necessary 
condition for attaining thermodynamic equilibrium pi: 
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where z = exp(/?^) with /3 = l/ksT. 

Similarly for a particle deletion, one chooses a particle 
at random from those currently present in the system, 
calculates the energy change AE^ associated with its 
removal and then performs the removal with a probability 
again given by a Metropolis rule 
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The function randQ generates a random number uniform in [0, 1 ] . 

Tlie subroutine ENER{) calculates the energy of a particle at a given position. 



Choose deletion or insertion with 

equal probability 

Select a particle for deletion 

Energy of particle 

Metropolis acceptance probability 

Accepted: remove particle 



New particle at random position 
Energy of new particle 
Metropolis acceptance probability 

Accepted: add new particle 



SUBROUTINE GCEMC 

IF (RAND().GT.0.5) THEN 
IF(N.LE.O) RETURN 
PD=I+INT (RAND()*N) 
DE=ENER(X(PD)) 
PROB=N*EXP(BETA*DE)/(Z*V) 
IF (RAND().LT.PROB) THEN 
X(PD)=X(N) 
N=NI 
ENDIF 
ELSE 

XI = RAND()*L 
DE=ENER(XI) 

PROB=Z*V*EXP(BETA*DE)/{N+ 1 ) 
IF (RAND().LT.PROB) THEN 
X(N+1) 
N=N+1 
ENDIF 
ENDIF 
RETURN 
END 



Box 1: Outline operation of a simple GCE simula- 
tion program. For brevity only the operations for the 
X coordinate of particle position vector has been given. 
Analogous operations apply to the y and z coordinates. 



Notwithstanding its simplicity, a GCE simulation of 
this type can provide a highly efhcient computational 
route to equilibrium fluid phase properties. This is es- 
pecially so if, as is customary, the interparticle potential 
U{r) is truncated at some cutoff radius rc Contri- 
butions to AE^ and AE^ then arise only via local in- 
teractions and, by partitioning the simulation box into 
cubic cells of side Tc, the subset of contributing particles 
can be readily identified. Note also, that for studies of 
liquid-gas phase transitions one gains nothing by imple- 
menting explicit particle displacement moves in a GCE 
simulation. Instead these are realized implicitly by virtue 
of repeated particle transfers, sole use of which not only 
constitutes a valid algorithm (it is clearly ergodic) but, 
moreover, focuses the computational effort on the bottle- 
neck for phase space evolution namely the fluctuations in 
the particle number density. 

Let us now examine how the GCE simulation algo- 
rithm of Box 1 behaves in the vicinity of the liquid-gas 
phase boundary. The primary observables of interest 
from such a simulation are the particle number N and 
the configurational energy £'({r}) where {r} denotes the 
set of particle position vectors i.e. the configuration. The 
fluctuations of the particle number density, p = N/V, in 
particular provide much insight into the nature of phase 
coexistence in a finite-sized system. In a simulation, one 
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can accumulate the probability density function (pdf) of 
the number density, p{p), in the form of a histogram av- 
eraged over many independent samples of p. The form 
of such a distribution close to liquid-gas coexistence is 
shown in schematic form in fig. |^. 
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FIG. 3. A schematic diagram of the form of p{p) on the 
liquid-gas phase boundary. 

The principal feature of this distribution is its bimodal 
(double-peaked) character. Each of the two peaks corre- 
sponds to one of the pure phases -the low density peak 
to the gas phase and the high density peak to the liq- 
uid phase. The location of the liquid-gas phase bound- 
ary line in the p — T plane is prescribed formally by the 
equal peak weight criteria i.e. by the set of values of 
p and T for which the integrated weights (areas) under 
the two peaks are equal. Thus to locate liquid-gas co- 
existence in a simulation one must tune p and T until 
the measured form of p{p) is doubly peaked with equal 
area under each peak. The problem for simulations is 
that to obtain accurate estimates for the relative peak 
weights, the simulation procedure must supply bountiful 
independent samples from each of the two phases, which 
in turn necessitates that it pass many times back and 
forth between them. 

Unfortunately, the inter-phase route necessarily tra- 
verses the mixed-phase configurations in which regions 
of both phases coexist within the simulation box. Such 
configurations have, on account of their surface tension, 
an a-priori probability that is intrinsically low compared 
to those of the pure phase states. This, of course, is the 
physical origin of the trough separating the two peaks 
of p{p) which may accordingly be regarded as a "proba- 
bility (or free energy) barrier" to inter-phase transitions 
0. The height of this barrier is taken to be the ratio of 
the maximum (peak) value of p{p) to its minimum value 
in the trough. For large barrier heights, the free energy 
penalty associated with forming mixed-phase configura- 
tions is so high that transitions between the two pure 
phases can become very rare on the timescale accessible 
to simulation. As a consequence, the correlation time 
becomes large, hindering the accumulation of indepen- 
dent statistics on the relative peak weights and thence 



the accurate location of the phase boundary. 

The barrier height (and hence the scale of the difficult) 
depends on the temperature. At the critical temperature 
the surface tension vanishes and with it the probability 
barrier If, however, one follows the liquid-gas bound- 
ary to progressively lower sub-critical temperatures, ther- 
mal fluctuations diminish and the surface tension grows. 
This is manifest in a growth in the barrier height ac- 
companied by a narrowing of the peaks of p{p)- These 
features are illustrated in fig. ^ which shows the mea- 
sured form of p{p) (obtained using the GCE algorithm 
of box 1.) for the LJF close to the phase boundary for 
the two temperatures T = 0.985Tc and T = 0M5Tc. 
Also shown in fig. ^(b) is the corresponding evolution 
of the density expressed as a function of the number of 
Monte Carlo insertion/deletion attempts. For the higher 
temperature, the barrier height is approximately 5 and 
inter-phase transitions are fairly frequent. On decreasing 
the temperature to 0.965Tc, however, the barrier height 
increases to a factor 30 and inter-phase crossings become 
comparatively much rarer. This reluctance to transform 
between the phases is, of course, none other that the 
hystereis phenomenon, viewed not from the standpoint 
of thermodnamics, but in terms of the underlying statis- 
tical mechanics 0. 
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FIG. 4. (a) The form of p{p) at T = O.QSSTc and 
T = 0.965Tc (b) The associated time evolution of the number 
density at = 0.985Tc and (c) T = 0.965rc 

Qualitatively similar effects occur if the linear dimen- 
sion of the simulation box L is increased. The barrier 
height grows because the surface tension of the interface 
increases proportionally to its area for d — 3). Ad- 

ditionally, the peaks oip{p) narrow due to 'self-averaging' 
of fluctuations. Even for quite modest system sizes the 
barrier can be prohibitively large for inter-phase crossings 
to occur on the accessible simulation timescale. 

Effectively then, one is caught between the 'rock' of 
wishing to minimise finite-size effects by employing a 
large simulation box, and the 'hard place' of seeking to 
sample on timescales exceeding the correlation time. Ev- 
idently therefore, a superior approach to bare GCE sim- 
ulation is called for if one is to tackle the problem of 
first order phase transitions at temperatures significantly 
below that of criticality. Indeed, over recent years con- 
siderable effort has been invested in developing new MC 
simulation methods for circumventing the problems iden- 
tified above. In the next sections we describe one solution 
to the problem which is rapidly becoming the method of 
choice for high resolution studies of fluid phase equilibria. 

III. BEATING THE BARRIER 

Approaches for dealing with the free energy barrier 
in simulations of phase transitions fall broadly into two 
categories 

(i) Simulations without interfaces 

(ii) Biased sampling techniques 

Foremost in category (i) are methods such as the Gibbs 
Ensemble Monte Carlo (GEMC) § and Gibbs-Duhem 
integration [|lo|, both of which have enjoyed widespread 
use and popularity in studies of phase transitions in 
model fluids. Although quite distinct in character, both 
these methods serve to link the pure phases thermo- 
dynamically, without traversing mixed-phase configura- 
tions. Both are versatile and fairly easy to use. How- 
ever, they also have significant drawbacks as discussed in 
Sec. ^ Specifically, it seems difficult using the GEMC 
method to obtain coexistence data of high statistical 
quality without a large investment of computational ef- 
fort. By comparison, Gibbs-Duhem integration is more 
efficient, but potentially suffers from integration errors 
rendering it difficult to assess the accuracy of results for 
phase boundary properties. 

More recently, an alternative approach has emerged 
-one which although arguably less straightforward 
to implement, offers the rewards of considerably greater 
efficiency, precision and flexibility than GEMC or Gibbs- 
Duhem integration. This scheme is based on a synthe- 
sis of two existing simulation techniques: Multicanonical 



biased sampling and Histogram Reweighting. In the re- 
mainder of this article, we shall describe how this com- 
bined method operates and set out in a step-by-step fash- 
ion how one goes about implementing it in practice. 

A. Multicanonical Sampling 

Multicanonical MC owes its origin to the biased sam- 
pling techniques flrst introduced in the 1970's by Torrie 
and Valleau to calculate free energies . Latterly how- 
ever, such techniques have gained fresh impetus with the 
realisation that they permit the bridging of the free 
energy barrier at a first order phase boundary. In this 
context the term "Multicanonical sampling" was coined 
and the method applied successfully to the study of phase 
transitions and free energy landscapes in a variety of 
lattice-based spin systems [Q. 

The basic idea underpinning Multicanonical MC is to 
preweight the sampling of configuration space in such a 
way as to artificially enhance the occurrence of the mixed- 
phase configurations of intrinsically low probability. By 
so doing it is possible to overcome the probability barrier 
separating the two pure phases, thereby allowing the sim- 
ulation to pass unhindered between them. The result is 
a great reduction in the correlation time of the sampling 
process. 

Within the GCE framework, the biasing is achieved 
by use of a preweighting function incorporated into the 
Metropolis acceptance criteria for particle insertions and 
deletions (cf. Box 1). Its purpose is to modify (with 
respect to standard Boltzmann statistics) the probabil- 
ity with which configurations of the various densities are 
visited, in such a way that the measured number density 
pdf is approximately flat over the entire density range 
separating the two pure phases. Of course the results of 
such biased simulations deviate from Boltzmann statis- 
tics and consequently lack direct physical significance. 
Nevertheless, it is possible (as we describe below) to un- 
fold from the simulation results the unwanted effects of 
the imposed bias, thence recovering the physically rel- 
evant quantities which would have been obtained in an 
unbiased simulation had sufficient run-time been avail- 
able. In general, however, there is a price to be paid for 
this gain, and that is the expenditure of effort involved 
in finding a suitable form for the preweighting function. 
Fortunately it transpires that for the purposes of tracing 
phase boundaries this is not always necessary, it being 
possible to obtain a suitable preweighting function for 
'free'. 



1. Formalism 

Let us begin by considering the GCE form of the parti- 
cle number pdf, p{N), at inverse temperature (3 — l/fc^T 
and chemical potential fi. For a system of volume V this 
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takes the form of a simple average of the Boltzmann fac- 
tor over all possible particle positions 



p{N\V,(3,^l) 



1 ^ 

in 



dr. 



(3.1) 



where Ti is the configurational Hamiltonian given by 
7i({r}, N) = E{{t}) - nN, while Z = Z{(3, y) is the par- 
tition function which serves to normalise the distribution 
to unit integrated weight. 

The multicanonical method operates by sampling not 
from a simple Boltzmann distribution with Hamiltonian 
7Y({r},iV), but from a modified distribution with effec- 
tive Hamiltonian 



'H = n + T]iN) 



(3.2) 



where ri{N) is a preweighting function defined on the 
set of particle numbers N |15|. The associated particle 
number distribution is given by 
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N 



p{N\V,l3,fi,v{N)) = ^l[ 

z— 1 



(3.3) 



Let us now suppose for the sake of argument that we 
are able to choose the preweighting function such that 
?7(iV) = Inp(iV) where p{N) is the desired Bol tzmann 
density distribution. Inspection of eqs. ^?l|-|3^ reveals 
that this implies p{N) = constant V N. To the extent 
that such a choice of weight function can actually be re- 
alised, the density then performs a 1-d random walk over 
its entire domain, thereby allowing extremely efficient ac- 
cumulation of the preweighted histogram p{N). 

Unfortunately, this happy state of affairs cannot in 
general be immediately achieved because the preweight- 
ing function ri{N) = \np{N) that serves to flatten p{N) 
is, of course, just the logarithm of the function we are 
trying to find! Means must therefore be found to obtain 
a form for r]{N) that approximates liip{N) sufficiently 
well that inter-phase transitions occur with an accept- 
ably high frequency. More refined forms for r]{N) can 
thereafter be obtained in the further course of the study. 

Let us assume that a suitable preweighting function 
has been found, and a simulation performed to obtain 
good statistics for p{N). The next step is to infer the dis- 
tribution p{N) we actually seek by unfolding the effects 
of the multicanonical preweighting. This is achievable 
because knowledge of the preweighting function tells us 
exactly by what degree the relative probabilities of the 
states of various N were altered in the simulation with 
respect to the true Boltzmann statistics. One therefore 
need only divide out the relative probability enhance- 
ments from p{N) to yield p{N). This is done for each 
value of N in the range of interest by the simple reweight- 



mg: 



p(iV|F,/3,M) = e''W p{N\f3,ti,r,{N)) 



(3.4) 



The details of the practica l impl ementation of this pro- 
cedure are described in Sec. HIC, 



B. Histogram reweighting 

Histogram Reweighting is the second ingredient in our 
simulation procedure. It rests upon the observation 
that histograms of observables accumulated at one set of 
model parameters (in our case /3 and /x) can be analysed 
to provide estimates of histograms appropriate to other 
values of these parameters. Consider the joint probability 
distribution of energy and particle number fluctuations 
at some particular parameter values f3 — f3o and fj, — fiQ. 
Formally this is given by 

piN,E I y,/3o,/io) 

= ^ n {X ' mr^}))e-^"''" , (3.5) 

where Hodr}, N) = E{{r}) + fioN. It is easy to show 
|l6| that an estimate for the form of p{N, E) at some 
other parameters /3 = /3i, /i = /ii can be obtained from 
the measured pdf by the simple reweighting: 



p(iV,i?|F,/3i,/ii) 



Zi 



(3.6) 



where the ratio Zi /Zq is an unimportant constant which 
is effectively absorbed into the normalisation. If desired, 
this joint distribution can then be marginalised to yield 
a uni-variable distribution, eg. the particle number pdf 
at 

p{N I F,/3i,Mi) = j dEp{N,E\ I/,/3i,Mi) • (3.7) 

Hence, in principle at least, a single simulation at one 
state point in the phase diagram suffices to obtain infor- 
mation concerning all other state points. Unfortunately 
the reality of the situation is less auspicious. Owing 
to finite sampling time, it is not possible in practice to 
reweight a single histogram obtained from a simulation 
at some /3o;Mo to arbitrary values of Pi,fj,i. Instead the 
parameters to which one extrapolates must be close (in 
a sense we shall describe) to those at which the simu- 
lation was actually performed, otherwise the procedure 
loses accuracy. 

The problem is traceable to the fact that the reweight- 
ing represented by eq. |3.6| may drastically modify the 
relative statistical weights of the various members of the 
set of configurations that contribute to the spectrum of 
measurements. Specifically, difficulties arise with that 
subset of sampled configurations having very low Boltz- 
mann weight at the simulation parameters /3o,/^Oj mem- 
bers of which consequently occur only very rarely in the 
sample. For expectation values of observables calculated 
at Po,fio, contributions from this rare subset of configu- 
rations do not contribute disproportionately to statistical 
uncertainties. However, under the reweighting to 
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the configurations in question may be assigned a much 
greater statistical weight, one which does not reflect their 
actual representation in the overall sample. This has the 
efi^ect of magnifying the overall statistical error on mea- 
sured expectation values and is manifest as a reweighted 
histogram that appears 'ragged' in its extremal regions. 

One way of dealing with this problem is to perform 
a sequence of separate simulations at strategic intervals 
across the range of model parameters of interest. Typi- 
cally the intervals are chosen such that histogram of some 
observable (eg. the energy) accumulated at neighbouring 
state points in the sequence overlap within some region 
of their domain. The role of Histogram Reweighting to 
then to interpolate into the regions of parameter space 
between the simulation points. In this context it should 
be noted that it is possible to combine (in a self consistent 
fashion) the results of a number of different simulations 
at different model parameters and perform Histogram 
Reweighting on the aggregate data. For a description 
of this more sophisticated procedure we refer the reader 
to refs. (l|J|. 



but to repeat the whole simulation. A superior approach, 
retaining complete information, involves decoupling the 
data analysis from the simulation by recording the full 
history of raw data measurements. This data is then 
postprocessed by a separate analysis program. Although 
such an approach can make for large simulation output 
files, it has the overriding advantage of ensuring maxi- 
mum fiexibility in terms of data analysis. 

To facilitate the post processing approach, the raw 
data should be accumulated in the form of a list. Sup- 
pose we perform a GCE MC simulation of the LJF at a 
given [3 = I3q and ii = hq, employing some chosen weight 
function r]{N). As the simulation proceeds we make a 
succession of measurements (performed at regular spaced 
intervals of time) of the observables E and N, together 
with any other quantities of interest. Successive mea- 
surements of these observables are gathered into a list by 
appending them to a file, viz: 

{Eo,No,Oo,..} 
{E,,N,,Oi,..} 

{E2,N2,02,..} 



C. Stitching together the pieces 

So how does one implement the above formalism in a 
GCE simulation of a fluid? In fact the task falls naturally 
into two parts: performing the actual multicanonically 
preweighted simulation and the subsequent data analysis. 
We consider them in turn. 

The business of implementing the multicanonical 
preweighting within a GCE simulation is basically quite 
straightforward. Assuming an appropriate set of multi- 
canonical weights has been found, all one need do is read 
them in and store them in an array. The simulation then 
proceeds as outlined in Box 1., except that the Metropolis 
acceptance probabilities for insertion and deletion (eqns. 
O and ^.21) are modified to read: 



1, 



zV 



r]{N + l) (N + l) 



P.cc{N -> iV - 1) 



r/{N) N 
r]{N - \)'zV 



exp(-/3(A£;^)) 



(3.8) 



(3.9) 



Consider now the simulation quantities we seek to ob- 
tain, namely the probability distributions of N ^ E and 
any other observables of interest. Obviously it is tempt- 
ing to accumulate these distributions in the form of his- 
tograms built up in the course of the simulation by sim- 
ply bining successive measurements into an array. But 
for continuous variables such as the energy, this strategy 
necessitates a prior choice for the histogram bin-width. 
Should a choice be made that subsequently turns out to 
be unsatisfactory, then one is faced with little alternative 



{£;„iv„o„..} 



{Em^ NmtOm, ..} 

where j indexes the series of M + 1 measurements and 
O denotes an observable of interest. The data list is 
analysed following the actual simulation by a separate 
post-processing program, the task of which is three fold. 
Firstly it should remove from the data the unwanted ef- 
fects of the preweighting in order to recover the desired 
Boltzmann distributed statistical properties. Secondly, 
it should output pdf's of the observables of interest in 
the form of histograms. Thirdly, it should (if desired) 
reweight the data to obtain estimates of histograms ap- 
propriate to parameter values different from those at 
which the simulation was performed. 

It turns out that the operations of histogram reweight- 
ing and bias removal can be accomplished simultaneously 
because their mathematical structures are very similar. 
To do this, one simply runs through the data list assign- 
ing each entry j a statistical weight 



A-Wi~MEj + (l3tti.t~l3alia)Nj+r,(Nj)] 



(3.10) 



where f3i , /^i are the parameters to which one wishes 
to extrapolate. The complete set of M -{- 1 weights 
wo,wi,,,wm is then used to construct the reweighted 
histogram for some measured observable of interest O: 

AI 

H{0 \V, A, Ml) = "'^^(O - O^)- (3.11) 

j=o 

This histogram, once suitably normalised, constitutes 
a discrete estimate for the probability distribution 
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p{0' I /3i,/xi). Implicit in its construction is the spec- 
ification of the bin-width, which may need to be tuned 
to strike a balance between resolution and data smooth- 
ness. But with the raw list safely stored in a file, this is 
something that can be done very quickly. 



IV. TRACING COEXISTENCE CURVES 

We now turn to the actual simulation procedure by 
which a fluid phase boundary can be determined. For 
illustrative purposes, we shall remain with our prototype 
example, the liquid-gas transition of the LJF. The essen- 
tial approach is nevertheless rather general and can be 
applied to other types of phase transition such as dcmix- 
ing transitions in fluid mixtures. 

Clearly one of the key components of our procedure 
is multicanonical preweighting, use of which generally 
entails a degree of preliminary effort in order to deter- 
mine a suitable preweighting function. It transpires, how- 
ever, that provided one begins tracing the phase bound- 
ary from near the critical point which marks its termi- 
nus, no additional effort need be expended determining 
preweighting functions. Instead these can be obtained 
for 'free' by virtue of histogram reweighting as we shall 
now describe. 

One starts off by gauging the approximate location 
of the critical point from a series of short runs on a 
small system. This is not time consuming and can be 
performed interactively at the computer, provided one 
arranges that the data list (see section [II Cj ) is deliv- 
ered directly to the screen. One starts by nominating a 
temperature T and a large negative value of /i at which 
a short simulation is performed without multicanonical 
preweighting. The fluctuating number density will typi- 
cally settle down quite rapidly, and its average value can 
be estimated visually from the data output. One then 
repeats this procedure for a succession of progressively 
larger ^ values. As ^ is increased, the average particle 
number will be observed to increase steadily. However on 
traversing the hysteresis-shifted phase boundary, a sud- 
den jump will occur in the density. If this jump is large, 
eg. for the LJ fluid, from p = 0.05 to p = 0.6 (in units 



of cr~ ), then the temperature is well below the critical 
temperature and one should increase T somewhat and 
begin again. If the jump is somewhat smaller e.g. from 
p ~ 0.2 to ~ 0.4, then one has obtained an estimate for 
a near-critical point on the phase boundary fi^{P). Of 
course, if no jump in density is observed at all then the 
temperature probably exceeds the critical temperature 
and should be reduced. 

One next performs a longer run for some larger system 
of interest at the estimated near-critical phase bound- 
ary point, let us call it fJ-a-if^o)- Since the surface tension 
and the associated barrier to inter-phase crossings are 
low near criticality, it should be possible to accumulate 
an accurate form for p{N, E) (including information on 



states 'between the peaks') without resort to multicanon- 
ical preweighting. Having done this, the next step is to 
reweight the data accumulated from this run to obtain 
an estimate of the form of p{N) at some lower tempera- 
ture point on the phase boundary [Q. This is achieved 
by first choosing an extrapolation temperature /3i inside 
the range of reliable reweighting (so that the reweighted 
distribution Pea; (A^ | /3i,/ii) appears smooth). One then 
tunes p within the histogram reweighting scheme until 
Pex{N I Pi,pi) is bimodal with equal area under each 
peak. This tuning procedure can be easily automated 
within the analysis program to deliver precise values of 
the phase boundary chemical potential pi = pa-ifii)- 

The reweighted phase boundary histogram 
Pex{N I (3i,pi) will, on account of the lower tempera- 
ture, be more strongly peaked than that from which it 
derives. Thus multicanonical preweighting will probably 
be necessary for a new simulation at (3i, pi. Fortunately 
however, a suitable preweighting function is already to 
hand -it is just the extrapolated function pe^, (A^ | (3i,pi). 
Thus all one need do is set ?7(A^) — Pex{N \ Pi^pi) and 
perform a multicanonical simulation at Pi,pi to obtain 
the actual distribution p{N^ E \ f3i, pi). 

One then simply iterates this procedure: histogram 
reweighting oip{N, E \ f3i,pi) is used to estimate a phase 
boundary point p2 = Pai/32) at temperature P2, together 
with the extrapolated distribution Pex{N \ P2,P2)- The 
latter serves as a preweighting function for a further sim- 
ulation at P2tP2, and so on. In this way one steps down 
the coexistence curve, obtaining at the same time the lo- 
cus of the phase boundary pa- (/3) and the associated set 
of number density pdf's. Clearly the maximum feasible 
step size is set by the range of reliable histogram extrapo- 
lation, which does decreases with increasing system size. 
In practice, however, quite large strides can be made even 
for large systems. For example, in reference [0, a sys- 
tem of approximately 600 LJ particles was studied and 7 
steps were required to reach a subcritical temperature of 
T = OMTc. 

The results of implementing this procedure for the LJF 
are depicted in fig. ||. The forms for p{N) shown have 
equal weight in each peak and hence lie on the phase 
boundary. The associated phase diagram p^ {(3) is shown 
in figure ^ The values of the coexisting densities can 
be simply read off from the peak positions in fig. ^(a). 
The enormity of the probability barrier that multicanon- 
ical preweighting allows one to negotiate is revealed by 
plotting these distributions on a log scale, fig. §(b). 



8 



25.0 




10^ 



10" 



(b) 



10"' 



10 



10 



10" 




0.4 
P 



FIG. 5. (a) A selection of the measured forms of p(p) on the 
phase boundary, for temperatures in the range 0.95 <T < Tc. 
(b) The same data expressed on a log scale. 
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FIG. 6. The line of liquid-vapour phase coexistence in 
the space of /i and T, for temperatures in the range 
0.95 < T < Tc. Also shown is the location of the critical 
point 10 



Finally in this section we point out that the measured 
coexistence form of the density distribution p(p) permits 
an estimate of the surface tension 7. For a cubic system 



of volume L^, this is found simply from the ratio of values 
of p{p) at the peak and at the trough between the peaks 
0: 



1 



2/3L' 



In 



Pn 



Pn 



Reference [|l7j describes a recent application of this re- 
lation in a study of the surface tension of the Lennard- 
Jones fluid. 



V. DISCUSSION AND CONCLUSIONS 

To summarise, we have described a scheme whereby 
information on the locus of a fluid phase boundary is ob- 
tainable via the dual mean of multicanonical preweight- 
ing and histogram reweighting, married to a standard 
grand canonical simulation algorithm. The method be- 
gins tracing the phase boundary from near the critical 
point where the free energy barrier to inter-phase tran- 
sitions is small. Histogram reweighting of the data thus 
obtained is used to provide an estimate of the location 
of the phase boundary at some lower temperature, to- 
gether with a suitable form of the requisite preweighting 
function. A new multicanonical simulation is performed 
at this lower temperature phase boundary point and the 
process is repeated. In this way it is possible to stride 
down the phase boundary, obtaining pdfs of observables 
such as the number density from which, in turn, the co- 
existence properties can be inferred. Such use of multi- 
canonical preweighting completely eliminates hysteresis 
at first order phase transition. 

In cases where one doesn't wish to start tracing a coex- 
istence curve from near the critical point, it is necessary 
to bootstrap the procedure described above by obtaining 
an initial phase boundary preweighting function. A va- 
riety of techniques exist for achieving this, ranging from 
simple extrapolation of the weight function into the un- 
sampled region, to more sophisticated analyses of MC 
transition probabilities. Most of the techniques in com- 
mon use can be straightforwardly automated. It is be- 
yond the scope of this article to describe these methods 
in detail, and for further details we refer the interested 
reader to the literature pc| , p^ . 

Although we have illustrated our approach solely in 
the context of a simple fluid model, it should be noted 
that it is equally applicable to complex fluids such as 
molecules or polymers. In these systems, however, the 
GCE insertion probability is often small, so it is neces- 
sary to supplement the standard algorithm with a more 
intelligent insertion scheme -one which performs a biased 
choice of a molecular orientation favourable for the inser- 
tion. Methods such as Configurational Bias Monte Carlo 
pl[ and Recoil Growth ||2^ allow one to do this. Apart 
from this added complication, multicanonical preweight- 
ing and histogram reweighting are implemented exactly 
as for a simple fluid. 



9 



The scheme we have described is generahsable to other 
simulation ensembles such as the constant- NpT ensemble 
in which density fluctuations occur by means of volume 
changes at constant particle number, pressure and tem- 
perature. Use of this ensemble can be more efhcient than 
the GCE when dealing with very dense fluids where the 
success rate of particle insertions and deletions is small. 
The natural variable in which to preweight a constant- 
NpT ensemble simulation is the fluctuating volume, but 
otherwise the formalism is very similar to that described 
above. For other types of phase transitions, such as the 
liquid-liquid transitions occurring in binary fluid mix- 
tures, a suitable variable in which to preweight is usually 
the order parameter for the transition eg. the concentra- 
tion of one species. 

It is instructive to compare the scheme described in 
this article with other commonly used methods for trac- 
ing phase boundaries. One such method is the Gibbs- 
Duhem integration method wherein pairs of single phase 
simulations (one for each phase) are performed at vari- 
ous state points along the phase boundary. The single 
phase results are connected thermodynamically, not by 
negotiating the free energy barrier at each state point, 
but via a phase space path that runs back along each 
side of the phase boundary to some independently-known 
reference point on the phase boundary. Successive sim- 
ulation state points along the phase boundary are found 
from an integration scheme which turns out |23| to be 
a low order approximation to the histogram reweighting 
method. The Gibbs-Duhem method is versatile provided 
one has prior knowledge of a phase boundary reference 
point and is, for the purposes of obtaining a rough esti- 
mate of the phase boundary locus, doubtless faster than 
the method described in this article. However, its ac- 
curacy rests heavily upon the precision with which the 
boundary reference point is known. Moreover, as one 
departs from this point, integration errors can grow and 
provided one remains within the (rather wide) region of 
metastability bordering either side of the phase bound- 
ary line, there is no feedback to indicate that one has in 
fact strayed away from it. By contrast, the multicanon- 
ical method is self-correcting because it reconnects the 
two phases at each successive simulation state point on 
the phase boundary. 

Another widely used technique is the Gibbs Ensemble 
MC Method |^ . Two separate simulation boxes (one for 
each pure phase) are connected thermodynamically by 
the dual means of particle transfers between the boxes 
and fluctuations in the box volumes implemented such 
that the overall volume of the two boxes remains con- 
stant. Like Gibbs-Duhem integration, GEMC is very 
useful for obtaining rough estimates of phase bound- 
aries. The method is free of integration errors since 
the phases remain directly linked at each phase bound- 
ary state point. However, measurements of the pressure 
and chemical potential must be obtained by a sampling 
scheme since neither is imposed in the simulation. More- 
over, direct comparisons have shown GEMC to be much 



less efficient than the methods described in this article 
because of the computational complexity associated with 
implementing volume fluctuations [^,^. Worries have 
also recently been voiced that the GEMC method suffers 
more severely from finite size effects than does the GCE 

Finally, it should be pointed out that despite their 
utility in dealing with phase equilibria involving fluids, 
the specific multicanonical techniques we have described 
do not permit one to tackle phase transitions involving 
solids. The problem here is that when attempting to tra- 
verse mixed-phase states involving crystalline order, the 
simulation invariably gets caught in 'non-ergodic traps' 
identifiable with defective crystalline configurations. Re- 
cently, however, a new technique has been developed that 
circumvents this problem by linking the two coexisting 
phases without traversing mixed-phase states. Essentially 
the method can be thought of as leaping directly from 
the configuration space of one pure phase to that of the 
other. Again use of multicanonical sampling is necessary, 
but this time its role is to encourage the simulation to 
visit a subset of configurations (in each pure phase) from 
which a leap to the other phase will be accepted. This 
new method has recently shown its worth in studies of 
solid-phase free energy differences and hard sphere freez- 

ing mM- 



VI. SUGGESTIONS FOR FURTHER STUDY 

The following set of problems will help to reader to 
become more familiar with the techniques described in 
this article. 



1. 



Using Box 1. as a guide, write a simple Grand 
Canonical ensemble MC program to simulate the 
Lennard- Jones fluid with potential cut-off at Vc — 
2.5(7. Do not correct for the potential truncation. 
Make sure the program prints out the particle num- 
ber and the energy in list form (cf. sec. [H C). Fur- 



ther programming details can be found in ref. [Q. 

2. Run the program at the near critical phase bound- 
ary parameters /3e = 1.1876, (3^1 = -2.778, 
saving the output data list to a file. 

3. Write a post-processing program to construct the 
number density histogram p{N) from the raw data 
list. 

4. Modify your GCE acceptance probabilitie s to c ater 
for m ulticanonical preweighting (cf. sees. HI A and 
inC). Use your measured p{N) as the preweight- 
ing function for a multicanonical simulation at the 
same /3, ^ used in (2) above. Hint: At the extrema 
of small (large) TV, there will be histogram bins 
in p{N) having zero entries. Before using p{N) as 
your preweighting function, set these entries to be a 
constant equal to the smallest non zero entry. This 
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avoids possible division by zero in the acceptance 
probabilities. 

5. Extend your post-processing program to unfold 
the effects of the p reweighting (as described in 
Sees. plA| and |mc]) in order to find p{N). Check 
that the form of p{N) thus obtained agrees with 
that found without multicanonical preweighting. 
Compare the correlation time for the sampling pro- 
cesses with and without multicanonical preweight- 
ing. 

6. Further extend your post-processing program to 
implement histogram reweighting (as described in 
Sec. IIIB and [II C). Extrapolate the data ob- 
tained at the critical temperature /3e — 1.1876 to 
find the location of the phase boundary and the 
form oi p{N) at /3e — 1.17. Use this extrapolation 
as a preweighting function in a new multicanonical 
simulation at the new phase boundary state point. 
Compare your results with those of ref. ||l|] 
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